Extending Relational Algebra to express one-to-many data transformations

Authors

  • Paulo Carreira
  • Helena Galhardas
  • Antónia Lopes
  • João L. M. Pereira
Abstract

Application scenarios such as legacy-data migration, ETL processes, data cleaning, and data integration require the transformation of input tuples into output tuples. Traditional approaches implement these data transformations either as Persistent Stored Modules (PSM) executed by an RDBMS or as transformation code written with a commercial ETL tool. Neither of these solutions is easily maintainable or optimizable. To take advantage of the optimization capabilities of RDBMSs, data transformations are often expressed as relational queries. However, the limited expressive power of relational query languages like SQL hinders this approach. In particular, an important class of data transformations that produce several output tuples for a single input tuple cannot be expressed as a relational query. In this paper, we present the formal definition of a new operator, named the data mapper operator, as an extension to the relational algebra to address this important class of data transformations. We demonstrate that relational algebra extended with the mapper operator is more expressive than standard relational algebra. Furthermore, we investigate several properties of the operator and supply a set of algebraic rewriting rules, together with their proofs of correctness, that enable the logical optimization of expressions that combine standard relational operators with mappers.
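To make the notion of a one-to-many transformation concrete, the following is a minimal Python sketch, not the formal operator defined in the paper. It assumes a hypothetical loans relation with the attributes `account`, `amount`, and `installments`, and a hypothetical function `split_into_installments` that emits one payment tuple per installment. The helper name `apply_mapper` is likewise an illustrative choice.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


def apply_mapper(rows: Iterable[Row],
                 fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Apply fn to every input tuple and concatenate the results.

    fn may return zero, one, or many output tuples per input tuple,
    which is the one-to-many behaviour the abstract describes.
    """
    for row in rows:
        yield from fn(row)


def split_into_installments(loan: Row) -> Iterator[Row]:
    """Hypothetical mapper function: one payment tuple per installment."""
    n = int(loan["installments"])
    if n <= 0:
        return  # an input tuple may produce no output tuples at all
    base, remainder = divmod(int(loan["amount"]), n)
    for seq in range(1, n + 1):
        # Spread the remainder over the first installments so the
        # payments add up exactly to the original amount.
        yield {
            "account": loan["account"],
            "seq": seq,
            "amount": base + (1 if seq <= remainder else 0),
        }


# Hypothetical input relation (illustrative data only).
loans = [
    {"account": "A1", "amount": 100, "installments": 3},
    {"account": "B2", "amount": 50, "installments": 1},
]

payments = list(apply_mapper(loans, split_into_installments))
for payment in payments:
    print(payment)
```

The number of output tuples here depends on a value stored in each input tuple, which is the kind of transformation the abstract says cannot be written as a standard relational query.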

Similar resources

On the performance of one-to-many data transformations

Relational Database Systems often support activities like data warehousing, cleaning and integration. All these activities require performing some sort of data transformations. Since data often resides on relational databases, data transformations are often specified using SQL, which is based on relational algebra. However, many useful data transformations cannot be expressed as SQL queries due...


Performance Analysis of One-to-Many Data Transformations

Relational Database Systems often support activities like data warehousing, cleaning and integration. All these activities require performing some sort of data transformations. Since data often resides on relational databases, data transformations are often specified using SQL, which is based on relational algebra. However, many useful data transformations cannot be expressed as SQL queries due...


Data Mapper: An Operator for Expressing One-to-Many Data Transformations

Transforming data is a fundamental operation in application scenarios involving data integration, legacy data migration, data cleaning, and extract-transform-load processes. Data transformations are often implemented as relational queries that aim at leveraging the optimization capabilities of most RDBMSs. However, relational query languages like SQL are not expressive enough to specify an impo...


One-to-many data transformations through data mappers

The optimization capabilities of RDBMSs are making them attractive for executing data transformations. However, despite the fact that many useful data transformations can be expressed as relational queries, an important class of data transformations that produce several output tuples for a single input tuple cannot be expressed in that way. To overcome this limitation, we propose to extend Rel...


Extending the Relational Algebra with the Mapper Operator

Application scenarios such as legacy data migration, Extract-Transform-Load (ETL) processes, and data cleaning require the transformation of input tuples into output tuples. Traditional approaches for implementing these data transformations enclose solutions as Persistent Stored Modules (PSM) executed by an RDBMS or transformation code using a commercial ETL tool. Neither of these is easily main...




Publication date: 2005